{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A simple demo for the OpenHowNet Python Package\n",
    "\n",
    "To begin with, make sure you have installed **Python 3.X**. \n",
    "\n",
    "Also, the [**anytree**](https://pypi.org/project/anytree/) is required to be installed, which is the only dependency for OpenHowNet.\n",
    "\n",
    "Next, you should follow the [instruction](https://github.com/thunlp/OpenHowNet#installation) to install **OpenHowNet** API. \n",
    "\n",
    "After that, you can import the module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the OpenHowNet module\n",
    "import OpenHowNet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we can create a **HowNetDict** object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize HowNetDict, you can initialize the similarity calculation module by setting the init_sim to True.\n",
    "hownet_dict = OpenHowNet.HowNetDict(init_sim=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the preparation work is all done. Let's explore some important features of HowNetDict."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage of OpenHowNet\n",
    "\n",
    "### Get word annotations in HowNet\n",
    "\n",
    "By default, the api will search the target word in both English and Chinese annotations in HowNet, which will cause significant search overhead. Note that if the target word does not exist in HowNet annotation, this api will simply return an empty list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the senses list annotated with \"苹果\".\n",
    "result_list = hownet_dict.get_sense(\"苹果\")\n",
    "print(\"The number of retrievals: \", len(result_list))\n",
    "print(\"An example of retrievals: \", result_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In OpenHowNet package, the detailed information of senses and sememes in HowNet are wrapped into classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the detailed information of the sense.\n",
    "sense_example = result_list[0]\n",
    "print(\"Sense example:\", sense_example)\n",
    "print(\"Sense id: \",sense_example.No)\n",
    "print(\"English word in the sense: \", sense_example.en_word)\n",
    "print(\"Chinese word in the sense: \", sense_example.zh_word)\n",
    "print(\"HowNet annotation of the sense: \", sense_example.Def)\n",
    "print(\"Sememe list of the sense: \", sense_example.get_sememe_list())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the detailed information of the sememe.\n",
    "sememe_example = sense_example.get_sememe_list().pop()\n",
    "print(\"Sememe example: \", sememe_example)\n",
    "print(\"The English annotation of the sememe: \", sememe_example.en)\n",
    "print(\"The Chinese annotation of the sememe: \", sememe_example.zh)\n",
    "print(\"The frequency of occurrence of the sememe in HowNet: \", sememe_example.freq)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can visualize the retrieved HowNet structured annotations (\"sememe tree\") of sense as follow :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sense_example.visualize_sememe_tree()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides, you can get the Sememe instance list by the English annotation or Chinese annotation. Similarily, you can set the language of the input or set the `strict` to `False` to fuzzy match the sememe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sememe1 = hownet_dict.get_sememe('FormValue', language='en')\n",
    "sememe2 = hownet_dict.get_sememe('圆', language='zh')\n",
    "print(\"Retrieved sememes: \",sememe1, sememe2)\n",
    "\n",
    "sememe3 = hownet_dict.get_sememe('值', strict=False)\n",
    "print(\"Fuzzy match the sememes (retrieved {} results): \".format(len(sememe3)), sememe3[:5])\n",
    "\n",
    "sememe_all = hownet_dict.get_all_sememes()\n",
    "print(\"There are {} sememes in HowNet in total.\".format(len(sememe_all)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To boost the efficiency of the search process, you can specify the language of the target word as the following."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The number of mixed search results:\",len(hownet_dict.get_sense(\"X\")))\n",
    "print(\"The number of Chinese results:\",len(hownet_dict.get_sense(\"X\",language=\"zh\")))\n",
    "print(\"The number of English results:\",len(hownet_dict.get_sense(\"X\",language=\"en\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can limit the POS of the target word by setting the `pos`.  Besides, you can set the `strict` to false to make a fuzzy match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "res = hownet_dict.get_sense(\"苹果\", strict=False)\n",
    "print(\"Fuzzy match: (The number of retrievals: {})\".format(len(res)))\n",
    "print(\"Retrivals: {}\\n\".format(res))\n",
    "res = hownet_dict.get_sense(\"苹果\",pos='adj', strict=False)\n",
    "print(\"Fuzzy match and limit the POS to adj: (The number of retrievals: {})\".format(len(res)))\n",
    "print(\"Retrivals: {}\".format(res))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can get all senses by using the follow API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_senses = hownet_dict.get_all_senses()\n",
    "print(\"The number of all senses: {}\".format(len(all_senses)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides, you can also get all the English or Chinese words in HowNet annotations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "zh_word_list = hownet_dict.get_zh_words()\n",
    "en_word_list = hownet_dict.get_en_words()\n",
    "print(\"Chinese words in HowNet: \",zh_word_list[:10])\n",
    "print(\"English words in HowNet: \",en_word_list[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get sememe trees of a certain word in HowNet¶\n",
    "\n",
    "You can get the sememes by certain word in a variety of forms of presentation. Detailed explanation of params will be displayed in our documentation.\n",
    "First, you can retrieve all the senses annotated with the certain word and their sememes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the respective sememe list of the senses annotated with the word.\n",
    "# The word can be English or Chinese or *\n",
    "hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=False, expanded_layer=-1, K=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `display` can be set to \"tree\"/\"dict\"/\"list\"/\"visual\", and the function will return in different forms.\n",
    "1. When set to \"list\", the sememes will be returned in the form of list as shown above.\n",
    "2. When set to \"dict\", the function will return the sememe tree in the form of dict as shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict.get_sememes_by_word(word = '苹果', display='dict', merge=False, expanded_layer=-1, K=None)[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. When set to \"tree\", the function will return the senses and the root node of their respective sememe tree. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = hownet_dict.get_sememes_by_word(word = '苹果', display='tree', merge=False, expanded_layer=-1, K=None)[0]\n",
    "print(t)\n",
    "print(\"The type of the root node is:\", type(t['sememes']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. When set to \"visual\", the function will visualize the Top-K sememe trees. At this point, `K` can be set to control the num of the visualized sememe trees. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict.get_sememes_by_word(word = '苹果', display='visual', merge=False, expanded_layer=-1, K=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. `merge` and `expanded_layer` only work when `display==\"list\"`. When `merge==True`, the sememe lists of all the senses retrieved by the word will be merged into one. `expanded_layer` is set to control the expanded layer num of the sememe tree and by default it will be set to -1(expanded all layers)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Expand all layers and merge all the sememe list into one\n",
    "hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=True, expanded_layer=-1, K=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Expand the top2 layers and merge all the sememe list into one. Note that the first layer is the sense node. \n",
    "hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=True, expanded_layer=2, K=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get sememes via relations between sememes\n",
    "\n",
    "There are various relations between sememes as follows. The package provides api to retrieve related sememes.\n",
    "You can retrieve the relation between two sememes by the annotation of the sememe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_sememe_relations = hownet_dict.get_all_sememe_relations()\n",
    "print(all_sememe_relations)\n",
    "# Get the relation between sememes. Please pay attention to the order of the sememes.\n",
    "relations = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=False)\n",
    "print(relations)\n",
    "# You can get the triples in the form of (head_sememe, relation, tail_relation)\n",
    "triples = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=True)\n",
    "print(triples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want sememes that have the exact relation with some sememe, you can do as below. Note that you can also get triples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "triples = hownet_dict.get_related_sememes('FormValue', relation='hyponym',return_triples=True)\n",
    "print(triples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides, you can get related sememes directly by the sememe instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Take {} as example.\".format(sememe1[0]))\n",
    "print(\"The sememes that have the relaiton of hyponym with the sememe are:\")\n",
    "print(sememe1[0].get_related_sememes(relation='hyponym'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Moreover, you can get all the sememes that have relation with the exact sememe (ignore the order)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The sememes that have relaiton with the sememe {} are:\".format(sememe1[0]))\n",
    "print(sememe1[0].get_related_sememes())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Feature #1: Word Similarity Calculation via Sememes\n",
    "\n",
    "The following parts are mainly implemented by Jun Yan and integrated by Chenghao Yang. Our implementation is based on the paper:\n",
    "> Jiangming Liu, Jinan Xu, Yujie Zhang. An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet. In Proceedings of IJCNLP\n",
    "\n",
    "### Extra initializaiton\n",
    "Because there are some files required to be loaded for similarity calculation. The initialization overhead will be larger than before. To begin with, you can initialize the hownet_dict object as the following code :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict_anvanced = OpenHowNet.HowNetDict(init_sim=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also postpone the initialization work of similarity calculation until use. The following code serves as an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict.initialize_similarity_calculation()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get senses that have the same sememe list\n",
    "You can retrieve the senses that have the same sememe list with the exact sense. Note that the structured information is ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Take sense {} as an example. Its sememes contains: \".format(sense_example))\n",
    "print(sense_example.get_sememe_list())\n",
    "print(\"Senses that have the same sememe list contains: \")\n",
    "print(hownet_dict_anvanced.get_sense_synonyms(sense_example)[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get top-K nearest words of the given word\n",
    "Given an exact word, the function will return the Top-K nearest words in HowNet.\n",
    "First of all, the HowNetDict will match the senses in HowNet by the word and give the nearest words separately.\n",
    "Note that you must set the language of the words, and the calculation may takes a long time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict_anvanced.get_nearest_words('苹果', language='zh',K=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can get the similarity score as below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict_anvanced.get_nearest_words('苹果', language='zh',K=5,score=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By setting the `merge` to True, you can merge the words list of senses into one and get the Top-K words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict_anvanced.get_nearest_words('苹果', language='zh',K=5, merge=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Detailed explanation of params will be displayed in our documentation.\n",
    "\n",
    "### Calculate the similarity between two given words¶\n",
    "If any of the given words does not exist in HowNet annotations, this function will return -1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('The similarity of 苹果 and 梨 is {}.'.format(hownet_dict_anvanced.calculate_word_similarity('苹果','梨')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Feature #2: BabelNet Synset Search\n",
    "\n",
    "### Extra initializaiton\n",
    "Because there are more files required to be loaded for BabelNet dict. The initialization overhead will be larger than before. You can initialize the hownet_dict object as the following code :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict_anvanced = OpenHowNet.HowNetDict(init_babel=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or you can use the following API to initialize the BabelNet dict."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hownet_dict.initialize_babelnet_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can retrieve a synset instance and get the abundant information in it using the follow APIs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "syn_list = hownet_dict_anvanced.get_synset('黄色')\n",
    "print(\"{} results are retrieved and take the first one as an example\".format(len(syn_list)))\n",
    "syn_example = syn_list[0]\n",
    "print(\"Synset: {}\".format(syn_example))\n",
    "print(\"English synonyms: {}\".format(syn_example.en_synonyms))\n",
    "print(\"Chinese synonyms: {}\".format(syn_example.zh_synonyms))\n",
    "print(\"English glosses: {}\".format(syn_example.en_glosses))\n",
    "print(\"Chinese glosses: {}\".format(syn_example.zh_glosses))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also get all the synsets and relations between synsets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_synsets = hownet_dict_anvanced.get_all_babel_synsets()\n",
    "all_synset_relation = hownet_dict_anvanced.get_all_synset_relations()\n",
    "print(\"There are {} synsets and {} relations\".format(len(all_synsets),len(all_synset_relation)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also, you can search for the synsets that have the exact relation with the synset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "related_synsets = syn_example.get_related_synsets()\n",
    "print(\"There are {} synsets that have relation with the {}, they are: \".format(len(related_synsets), syn_example))\n",
    "print(related_synsets[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The package also provides search for the sememe list by the BabelNet sememe annotations.\n",
    "The API is similar with the HowNet APIs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(hownet_dict_anvanced.get_sememes_by_word_in_BabelNet('黄色'))\n",
    "print(hownet_dict_anvanced.get_sememes_by_word_in_BabelNet('黄色',merge=True))"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
